Abstract: Zoomable video streaming refers to a new class of
interactive video applications, where users can zoom into a video
stream to view a selected region of interest at higher resolution
and pan around to move the region of interest. The zoom and
pan effects are typically achieved by breaking the source video
into a grid of independently decodable tiles. Streaming the tiles
to a set of heterogeneous users using broadcast is challenging, as
users have different link rates and different regions of interest at
different resolution levels. In this paper, we consider the following
problem: given the subset of tiles that each user requested, the
link rate of each user, and the available time slots, at which
resolution should each tile be sent to maximize the overall video
quality received by all users? We design an efficient algorithm to
solve this problem and evaluate the solution on a testbed
using 10 mobile devices. Our method achieves up to
12 dB improvement over other heuristic methods.

I. Introduction 

Video streaming is a major Internet service that has been
widely used to carry both daily and major events (e.g., news,
TV, sports). With the proliferation of mobile devices,
streaming services continue to permeate further into our daily
lives. Meanwhile, as technology evolves, videos with
increasingly higher resolution are becoming available (e.g., 8K
UHD supports 33 megapixels). Due to screen size constraints
(especially on mobile devices) and bandwidth constraints,
however, video streaming playback is still limited in
resolution. As a result, high resolution videos are typically scaled
down before transmission, leading to a loss of information.

To address the mismatch of video resolution between the
capture device and playback, zoomable video streaming has
recently been proposed Q3), 03, G3, (2§, (2T). A zoomable
video supports zoom and pan as two new operations for a user
to interact with the video. In particular, a user is able to zoom
into a selected region of interest (RoI) in the video to view
the RoI at higher resolution. The user essentially views the
video through a viewport that defines a rectangular region in
the high resolution video, from which the displayed video is
cropped. While zoomed in, users can pan around by moving
the viewport to view different regions of the video.

In this paper, we are concerned with wireless multicasting
of zoomable video streams, which arises in scenarios such
as interactive TV and live events such as lecture broadcasting
on campus 0, Q2), [20], stage performances in concert halls,
and sports in stadiums (including eSports, e.g., spectating RTS
games). Multicast is a natural operation for transmitting such
content, as existing studies have reported that users tend to
zoom into a small number of regions in the video G3, with
substantial overlap among their views.


In a live zoomable video streaming system [20], multiple
resolution levels are available for each video stream. For
a given screen size in pixels, the desired resolution level of a
user depends on the size of the selected RoI. To stream
efficiently, the video is broken into a grid of small,
independently decodable regions, each termed a tile in this
paper. Instead of transmitting the whole frame, a minimum set
of tiles covering the selected RoI at the desired resolution
level is delivered.

The problem that we consider in this paper is the following:
given the available time slots for video transmission and the
RoIs selected by users, how do we determine, for each tile, the
resolution level at which it should be multicast to maximize the
overall utility of all users? There are two challenges in this
problem. First, the scheme has to deal with changes in both
the RoIs and the wireless channel, which affects the supported
link rates. Second, the solution has to be computationally
efficient and scalable (with respect to the number of
users/sessions, video qualities, link rates, and time horizon).

In this work, we propose a novel and efficient algorithm
to optimally solve this zoomable multicast problem. Our
algorithm is inspired by several recent works [9], [26] that
design optimal algorithms for video multicast allocation with
a focus on heterogeneous link rates. To evaluate our
algorithm, we implemented it on a testbed that consists of
the following key components: (i) mobile clients that support
zoomable video functions, (ii) a video server that supports
streaming of zoomable video, and (iii) a proxy that collects
client RoI requests and wireless link conditions, runs the
resource allocation algorithm, and multicasts the videos
obtained from the server to the clients.

The major contributions of our work are as follows: (i)
We model the zoomable video multicast problem as an
optimization problem and develop an optimal algorithm that
decides which resolution of which tile should be transmitted
at which link rate. The proposed optimal multicast improves
the average video quality by up to 12 dB, 6 dB, and 3 dB in
terms of PSNR compared with three baseline schemes,
adaptive unicast, adaptive multicast, and approximate multicast,
respectively. (ii) If we consider each tile as an individual video
session, our proposed algorithm can also be applied to the optimal
allocation of multi-session adaptive video streaming, and it has
a lower, more practical running time (growing linearly with the
number of time slots) than the existing optimal allocation
algorithms [9], [26]. (iii) We evaluate our solution on a
wireless streaming testbed with up to 10 Android phones.

The rest of the paper is structured as follows. Section II
discusses the related work. In Section III, we review the
background of tiled zoomable streaming and the mixed-resolution
tiling scheme. Section IV states our maximization problem.
We present our optimal algorithm in Section V. The system
implementation is detailed in Section VI, and performance
evaluation results of our algorithm on the Android platform are
presented in Section VII. Section VIII concludes the paper.


II. Related Work 


A tremendous amount of previous work aims at improving
video multicast streaming systems by dynamically adapting
video data rates and multicast link rates. In this section, we
discuss only the most relevant work, which can be broadly
classified into three categories: adaptive video multicast,
multicast link rate adaptation, and adaptive video with multicast
link rate adaptation.

Adaptive video multicast. Many adaptive video multicast
streaming approaches have been proposed to improve the
performance of video streaming systems 0- ED. Liu et al.
ED present an overview of existing studies and illustrate the
advantages of adaptive streaming over non-adaptive streaming.
The following study 0 reduces the problem of rate-adaptive
optimized streaming to the error-cost optimized transmission
problem and derives a fast practical algorithm to solve the
formulated optimization problem. Most of these works focus on
adapting the video rate (quality) at a fixed, low multicast link
rate, which may under-utilize the network resources.

Multicast link rate adaptation. Recently, multicast link
rate adaptation mechanisms have been suggested 0, d),
03, [24]. Instead of using the basic rate, a relatively high
broadcast rate is used for packet delivery, and FEC schemes
are leveraged to protect the data from packet losses [16], [24].

Among these works, the most relevant are DirCast 0
and Medusa G3. DirCast multicasts packets at the link rate
of the worst client for each access point (AP) and takes
into account the rate anomaly problem. We adopt
similar mechanisms. Medusa prioritizes frames according to
their importance and transmits the less important frames at
higher link rates. By utilizing this frame-level rate assignment
heuristic, Medusa achieves higher video quality with limited
resources.

Adaptive video with multicast rate adaptation. To further
improve streaming performance, the last category of research
jointly adapts the video data rate and the multicast link rate. This
category is the most related to our work. Deb et al. [23]
investigate the utility optimization problem of scalable video
multicast and prove that this problem is NP-hard. A greedy
algorithm is then proposed to schedule the transmissions
of layers and determine the corresponding modulation and
coding scheme (MCS) assigned to each transmission. Li et
al. [9] suggest a pseudo-polynomial algorithm based on dynamic
programming to solve the optimization problem. Most
recently, MuVi [26] has been designed to investigate the optimal
multicast scheduling problem for videos encoded with I, P, and
B frames. As the computational complexity of the algorithm
suggested in [9] grows quadratically with the number of available
time slots, it fails to efficiently solve the optimization problem
with multiple multicast sessions. To reduce the computational
complexity, especially for the case of multiple sessions, a fully
polynomial time approximation algorithm is presented in [10].
The approximation factor, however, decreases linearly with the
number of multicast sessions.

Our approach falls under the category of adaptive video
with multicast link rate adaptation. In contrast to previous
work, we focus on a scenario where each user is interested
in a subset of video tiles and user interests may partially
overlap. Our algorithm can also be easily applied to
optimization problems with multiple sessions.

III. Background of Mixed-Resolution Tiled Streaming

In this section, we review the background of the mixed-resolution
tiling scheme proposed by Wang et al. [22]. Moreover, to
evaluate the perceptual quality of this scheme, we present the
psychophysical experiment that we conducted.

A. Mixing Tile Resolutions in Tiled Video 

Zoomable video streaming is typically achieved using a
technique called tiled streaming, where video frames are
broken into a grid of tiles (Figure 1). We can view the video
as a three-dimensional matrix of tiles. Tiles at the same y-x
position in the matrix are temporally grouped and coded
along the z axis. The video is encoded into different resolutions
to support zooming. The zoomed-out view corresponds to the
lowest resolution. As the user zooms in, a minimum set of
tiles from the higher resolution video covering the RoI
is streamed. The location of the RoI can be changed by panning,
while the resolution can be changed by zooming.
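The minimum covering tile set can be illustrated with a short Python sketch. It assumes a uniform grid and an axis-aligned rectangular RoI given in pixel coordinates of the chosen resolution version; the function `tiles_covering_roi` and the example coordinates are hypothetical and not part of the system described here.

```python
import math


def tiles_covering_roi(frame_w, frame_h, cols, rows, x, y, w, h):
    """Return (row, col) indices of every tile that intersects the RoI.

    The frame (frame_w x frame_h pixels) is split into a uniform cols x rows
    grid; the RoI is the half-open rectangle [x, x + w) x [y, y + h).
    """
    tile_w = frame_w / cols
    tile_h = frame_h / rows
    # Clamp the RoI to the frame so a viewport panned past the border stays valid.
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(frame_w, x + w), min(frame_h, y + h)
    if x1 <= x0 or y1 <= y0:
        return []
    first_col, first_row = int(x0 // tile_w), int(y0 // tile_h)
    # The last tile is the one containing the final pixel column/row of the RoI.
    last_col = math.ceil(x1 / tile_w) - 1
    last_row = math.ceil(y1 / tile_h) - 1
    return [(r, c)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]


# 400x300 RoI at (500, 250) in the 1920x1080 version split into 16x9 tiles
tiles = tiles_covering_roi(1920, 1080, 16, 9, 500, 250, 400, 300)
print(len(tiles), tiles[0], tiles[-1])  # 12 (2, 4) (4, 7)
```

Clamping to the frame boundary is a choice of this sketch; a player could instead restrict panning so that the viewport never leaves the frame.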

The tiles at the same y-x position are decoded together by the
zoomable player at the client side. Tile groups at different
y-x positions can be decoded in parallel, and each frame is
formed by the decoded tiles with the same z position. The
zoomable player displays each frame in the original order
once all of its tiles have been decoded.

In existing works 0, [17], [18], at the server side, an
original video is normally encoded into different versions
(streams): frames of a low-resolution stream are constructed
from a smaller number of tiles, and frames of higher-resolution
streams are constructed from a larger number of tiles. At
the client side, the number of tiles required to cover the
physical screen resolution is fixed; therefore, the bandwidth
consumption of each user remains mostly constant. Initially,
a low resolution version of the video is sent to users.
When a user zooms into an RoI within the video, the server
first determines a suitable high-resolution stream based
on the requested RoI size (zoom level). It then selects tiles
covering the requested RoI from this stream. This mechanism
allows users to see their regions of interest in detail without
consuming more bandwidth.
Fig. 1. Tiled video.


The aforementioned RoI cropping technique performs well
in small-scale networks where the video stream is unicast. In one
of the use cases we consider, the video stream is consumed
by a large number of users at one location (e.g., in a
concert hall or stadium). To overcome the scalability issues
with such a large number of users and RoI requests, a wireless
multicast scheme is employed. When the RoIs of multiple users
partially overlap, tiles from the overlapping regions could
potentially be multicast to all interested users to save
bandwidth. In zoomable video, however, different users may
have different zoom levels (i.e., different RoI sizes) and will
need tiles from different versions encoded at different
resolutions, which limits the potential benefits of wireless
multicast.
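To make the multicast argument concrete, the following sketch counts transmissions for hypothetical requests. It treats tile identifiers as abstract labels and assumes that one transmission can be shared only by users who need the same tile from the same resolution version; `count_transmissions` is an illustrative helper.

```python
def count_transmissions(requests):
    """Compare unicast and multicast load for per-user tile requests.

    requests maps a user id to the set of (tile, level) pairs it needs.
    Unicast sends every pair once per user; multicast sends each distinct
    (tile, level) pair once, reaching every user that needs it.
    """
    unicast = sum(len(pairs) for pairs in requests.values())
    multicast = len(set().union(*requests.values()))
    return unicast, multicast


# Two users whose RoIs share tiles 3 and 4 (hypothetical tile ids).
same_zoom = {"u1": {(t, 3) for t in (1, 2, 3, 4)},
             "u2": {(t, 3) for t in (3, 4, 5, 6)}}
# Same tiles, but user u2 zoomed further and needs level 4.
diff_zoom = {"u1": {(t, 3) for t in (1, 2, 3, 4)},
             "u2": {(t, 4) for t in (3, 4, 5, 6)}}
print(count_transmissions(same_zoom))  # (8, 6)
print(count_transmissions(diff_zoom))  # (8, 8)
```

With equal zoom levels the two shared tiles save two transmissions; once the zoom levels differ, the overlap no longer helps.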

Instead of fixing the tile size, using a fixed number of tiles to
encode and decode videos could be more effective. At the
server side, an original video is encoded into different
resolution versions, but all versions consist of the same number
of tiles. The same number of tiles is required at the client
side to decode each video frame. Within a frame, however,
different tiles could come from different resolution streams.
If a tile comes from a stream with a resolution lower/higher
than the requested level, it is scaled up/down accordingly. In
zoomable video, when a user zooms into an RoI within the
video, the server first determines the tiles covering this RoI
and then associates each tile with an appropriate stream
version, depending on its popularity and the resource
constraints.
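The per-tile scaling step can be sketched as follows, assuming the 16x9 grid and the per-tile pixel sizes listed in Table I; `scale_factors` is a hypothetical helper that only computes resize factors and does not perform the actual resampling.

```python
# Tile side length (pixels) per resolution level for the 16x9 grid, from Table I.
TILE_SIDE_16X9 = {5: 120, 4: 100, 3: 80, 2: 60, 1: 40}


def scale_factors(requested_level, received_levels):
    """Resize factor for each tile so that it is displayed at requested_level.

    received_levels maps a tile id to the level the tile arrived at.
    A factor > 1 means the tile is scaled up, < 1 means scaled down.
    """
    target = TILE_SIDE_16X9[requested_level]
    return {tile: target / TILE_SIDE_16X9[level]
            for tile, level in received_levels.items()}


# Hypothetical case: the user requested level 4, tiles arrived at levels 5, 4 and 2.
factors = scale_factors(4, {"a": 5, "b": 4, "c": 2})
print({tile: round(f, 3) for tile, f in factors.items()})
# {'a': 0.833, 'b': 1.0, 'c': 1.667}
```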

The proposed mixed-resolution tiling scheme has the
following two essential advantages in tiled video streaming.
First, because each tile can be scaled up or down, users with
different zoom levels can share the same tile transmission, so
the number of multicast transmissions is considerably reduced.
Second, by intelligently allocating a resolution version to each
tile, the mixed-resolution approach may considerably reduce
bandwidth consumption without much impairment of the
perceived video quality. For instance, popular tiles requested
by many users could come from high-resolution streams, while
tiles requested by only one or a few users could come from a
low-resolution stream under limited bandwidth.

Although this scheme saves bandwidth, its impact on the
perceived quality is still unclear. Thus, to understand if, and
at what thresholds, users notice and/or accept the difference
between the original video and a tiled video with mixed
resolutions, we conducted a psychophysical study with 50
participants, which is presented in the remainder of this
section.

B. Perceptual Quality Assessment 

Using the method of limits from psychophysics [6], we
measure two perceptual thresholds, Just Noticeable Difference
(JND) and Just Unacceptable Difference (JUD), to understand
user perception of the quality of mixed-resolution tiled
video. The two difference thresholds partition the
quality degradation level (introduced by mixing tile
resolutions) into the following three intervals: no noticeable
quality degradation, noticeable (but acceptable) quality
degradation, and unacceptable quality degradation.

TABLE I
The number of pixels in each frame and each tile at different resolution levels.

Level   Frame       Tile (16x9 tiles)   Tile (80x45 tiles)
5       1920x1080   120x120             24x24
4       1600x900    100x100             20x20
3       1280x720    80x80               16x16
2       960x540     60x60               12x12
1       640x360     40x40               8x8
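The tile sizes in Table I follow from dividing each frame resolution by the grid dimensions, which the following check reproduces (the helper name `tile_size` is illustrative):

```python
# Frame resolution (width, height) per level, as listed in Table I.
FRAME_SIZE = {5: (1920, 1080), 4: (1600, 900), 3: (1280, 720),
              2: (960, 540), 1: (640, 360)}


def tile_size(level, cols, rows):
    """Pixel size of one tile when the level's frame is split into cols x rows tiles."""
    w, h = FRAME_SIZE[level]
    if w % cols or h % rows:
        raise ValueError("frame does not split evenly into the requested grid")
    return w // cols, h // rows


for level in sorted(FRAME_SIZE, reverse=True):
    print(level, tile_size(level, 16, 9), tile_size(level, 80, 45))
# 5 (120, 120) (24, 24)
# 4 (100, 100) (20, 20)
# 3 (80, 80) (16, 16)
# 2 (60, 60) (12, 12)
# 1 (40, 40) (8, 8)
```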


1) Setup: The experiments assess the quality of mixed-resolution
tiled video using three standard HD (1920x1080p) test video
files: Crowd-Run (dense motion, 50 fps), Old-Town-Cross
(medium motion, 50 fps), and Rush-Hour (low motion,
25 fps)¹. We have five resolution levels for each video file,
labeled from 5 to 1 (Table I). The frame sizes of the original
video at the five resolution levels are 1920x1080, 1600x900,
1280x720, 960x540, and 640x360 pixels.

In the experiments, we construct mixed-resolution tiled
video by mixing two resolution levels, where the higher
resolution level is denoted as $R_H$ and the lower resolution
level is denoted as $R_L$. Specifically, given a pair of $R_H$ and
$R_L$, we randomly allocate resolution level $R_H$ or $R_L$ to each
tile with equal probability. We restrict the range of $R_H$ to
$3 \le R_H \le 5$ and the range of $R_L$ to $1 \le R_L < R_H$.
Figures 2, 3, and 4 show screenshots of mixed-resolution
tiled videos.
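The random mixing rule can be sketched as below. The seeded generator, the helper names, and the use of pixel count as a proxy for data volume are assumptions of this illustration; encoded tile sizes are not proportional to pixel count, so the fraction it prints is not the bandwidth saving measured in this paper.

```python
import random

# Frame resolution (width, height) per level, as listed in Table I.
FRAME_SIZE = {5: (1920, 1080), 4: (1600, 900), 3: (1280, 720),
              2: (960, 540), 1: (640, 360)}


def pixels(level):
    w, h = FRAME_SIZE[level]
    return w * h


def mix_levels(r_h, r_l, cols=16, rows=9, seed=None):
    """Assign level r_h or r_l to every tile independently with probability 1/2."""
    if not (3 <= r_h <= 5 and 1 <= r_l < r_h):
        raise ValueError("the study uses 3 <= R_H <= 5 and 1 <= R_L < R_H")
    rng = random.Random(seed)
    return [[rng.choice((r_h, r_l)) for _ in range(cols)] for _ in range(rows)]


def pixel_fraction(grid, r_h):
    """Pixels in the mixed frame relative to a frame built only from level r_h.

    Every tile carries the same share (1 / number of tiles) of its level's frame.
    """
    n_tiles = sum(len(row) for row in grid)
    mixed = sum(pixels(level) for row in grid for level in row) / n_tiles
    return mixed / pixels(r_h)


def expected_pixel_fraction(r_h, r_l):
    # Each tile is r_h or r_l with probability 1/2, so the expectation is the midpoint.
    return 0.5 * (1 + pixels(r_l) / pixels(r_h))


grid = mix_levels(5, 3, seed=42)
print(round(pixel_fraction(grid, 5), 3))        # depends on the random draw
print(round(expected_pixel_fraction(5, 3), 3))  # 0.722
```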

Since the aspect ratio of the test HD video frame sequences
is 16:9, we break the video frames into 16x9 tiles by default.
As a result, each tile (view region) covers 1/144 of the
entire view region. To evaluate the impact of tile size, in
addition to the default configuration, we generate another set
of videos where each video frame is broken into 80x45 tiles.
The number of pixels of a tile at each resolution level is
shown in Table I.

¹ Available at http://media.xiph.org/video/derf/

Fig. 2. Mixing tile resolutions of Crowd-Run: (a) tiles from the HD stream; (b) mixing tiles from stream levels 5 and 3; (c) mixing tiles from stream levels 5 and 1.

Fig. 3. Mixing tile resolutions of Old-Town-Cross: (a) tiles from the HD stream; (b) mixing tiles from stream levels 5 and 3; (c) mixing tiles from stream levels 5 and 1.

Fig. 4. Mixing tile resolutions of Rush-Hour: (a) tiles from the HD stream; (b) mixing tiles from stream levels 5 and 3; (c) mixing tiles from stream levels 5 and 1.

2) Procedures: Fifty adult participants, primarily graduate
students and research staff from the National University of
Singapore, were invited to take part in our assessment. The
sample consisted of 16 women and 34 men; all had normal
vision. They were asked to watch the mixed-resolution tiled
videos online² using a monitor with full HD display resolution.

For configurations with 16x9 tiles, we vary the high
resolution level $R_H$ from 5 to 3, so 9 stimuli series are
generated over the three test videos. For configurations with
80x45 tiles, we generate stimuli series with $R_H = 5$ only. As a
result, we have 12 stimuli series in total, which are shuffled
and played in random order.

² The online website is available at: http://liubei.ddns.comp.nus.edu.sg/resMix

Fig. 5. Experiment procedure: (a) ascending stimuli series, presenting pairs with $R_L = 1, 2, \ldots, R_H - 1$; (b) descending stimuli series, presenting pairs with $R_L = R_H - 1, R_H - 2, \ldots, 1$; each pair is followed by a rating. The video is composed of tiles with resolution levels $R_H$ and $R_L$. The numbers in the figure represent the value of $R_L$; the first video in each pair is a standard tiled video where $R_L = R_H$, and the second video is a mixed-resolution tiled video.

For each series, the stimuli are randomly presented in either
ascending or descending order, as depicted in Figure 5. In a
stimuli series, we fix the high resolution level $R_H$ and vary
the low resolution level $R_L$. As shown in the figure, each
pair presents a standard video where $R_L = R_H$ and a
mixed-resolution tiled video. After watching the videos in a
pair (10 s per video), the participant is asked to rate the
level of difference between the two videos. In particular, two
questions are asked: (i) is the quality difference noticeable,
and (ii) is the quality difference unacceptable? In an
ascending series, we start from $R_L = 1$. On each successive
trial, we increase $R_L$ by 1 until the participant reports that
the difference is unnoticeable or $R_L = R_H - 1$. In a
descending series, the stimuli proceed in the opposite
direction: we start from $R_L = R_H - 1$ and gradually decrease
$R_L$ until the participant reports that the difference is
unacceptable or $R_L = 1$.
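The stopping rules of the two series types can be simulated with the sketch below. The participant's answers are modelled by two hypothetical callables, `noticeable` and `unacceptable`; the sketch only reproduces the order of the presented pairs and when a series ends.

```python
def run_series(r_h, ascending, noticeable, unacceptable):
    """Simulate one stimuli series of the method-of-limits procedure.

    noticeable(r_l) and unacceptable(r_l) stand in for a participant's answers
    to the two questions for the pair (R_H, R_L = r_l); they are hypothetical
    callables. Returns the presented trials as (r_l, noticed, rejected) tuples.
    """
    trials = []
    if ascending:
        # Start at R_L = 1; stop once the difference is unnoticeable or R_L = R_H - 1.
        levels = range(1, r_h)
    else:
        # Start at R_L = R_H - 1; stop once the difference is unacceptable or R_L = 1.
        levels = range(r_h - 1, 0, -1)
    for r_l in levels:
        noticed, rejected = noticeable(r_l), unacceptable(r_l)
        trials.append((r_l, noticed, rejected))
        if (ascending and not noticed) or (not ascending and rejected):
            break
    return trials


# Hypothetical participant: notices the difference for R_L <= 3, rejects it for R_L <= 1.
def notices(r_l):
    return r_l <= 3


def rejects(r_l):
    return r_l <= 1


print([t[0] for t in run_series(5, True, notices, rejects)])   # [1, 2, 3, 4]
print([t[0] for t in run_series(5, False, notices, rejects)])  # [4, 3, 2, 1]
```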

Using the above procedure, the obtained results fall into
the following three categories: (i) both the noticeable
difference threshold and the unacceptable difference threshold
are detected; (ii) only the noticeable difference threshold is
detected; and (iii) neither threshold can be detected. Once the
noticeable difference threshold and the unacceptable difference
threshold, denoted by $T_{nd}$ and $T_{ud}$, respectively, have
been detected, we follow the method of limits [6] and estimate
the Just Noticeable Difference threshold as

$$\mathrm{JND} = \frac{T_{nd} + (T_{nd} + 1)}{2} = T_{nd} + 0.5.$$

Similarly, we estimate the Just Unacceptable Difference
threshold as

$$\mathrm{JUD} = \frac{T_{ud} + (T_{ud} + 1)}{2} = T_{ud} + 0.5.$$

For the cases where we fail to detect a difference threshold,
we set the corresponding Just Noticeable/Unacceptable
Difference threshold to 0.
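The threshold estimate above, and its average over participants, can be computed as follows. Representing an undetected threshold as `None` and computing the 95% confidence interval with a normal approximation are assumptions of this sketch; the section does not state how its intervals were obtained.

```python
import math
import statistics


def threshold_estimate(detected):
    """Method-of-limits estimate for one participant.

    detected is the detected threshold T (T_nd or T_ud), or None when the
    series ended without detecting it; the estimate is then set to 0.
    """
    if detected is None:
        return 0.0
    return (detected + (detected + 1)) / 2  # equals T + 0.5


def average_threshold(detections):
    """Mean estimate over participants and a 95% CI half-width.

    The normal approximation 1.96 * s / sqrt(n) is an assumption of this sketch.
    """
    values = [threshold_estimate(t) for t in detections]
    mean = statistics.fmean(values)
    if len(values) < 2:
        return mean, 0.0
    return mean, 1.96 * statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical detections from four participants (None = not detected).
mean, half_width = average_threshold([3, 4, None, 2])
print(mean)  # 2.625
```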

3) Results: We first examine the configuration with 16x9
tiles. Figure 6 depicts the CDF of participants who cannot
notice any difference between the mixed-resolution tiled video
$(5, R_L)$ and the standard tiled HD video $(5, 5)$. The CDF of
participants who accept the quality difference is presented in
Figure 7. The average measured thresholds of Just Noticeable
Difference and Just Unacceptable Difference for $R_H$ in the
range from 5 to 3 are shown in Table II and Table III,
respectively.

TABLE II
The average Just Noticeable Difference threshold (number within parentheses is the 95% confidence interval value).

$R_H$   Crowd-Run      Old-Town-Cross   Rush-Hour
5       3.68 (±0.52)   3.25 (±0.47)     0.81 (±0.23)
4       2.74 (±0.39)   2.31 (±0.34)     0.24 (±0.10)
3       2.09 (±0.30)   1.73 (±0.26)     0.11 (±0.06)

TABLE III
The average Just Unacceptable Difference threshold (number within parentheses is the 95% confidence interval value).

$R_H$   Crowd-Run      Old-Town-Cross   Rush-Hour
5       2.03 (±0.31)   1.76 (±0.27)     0 (0)
4       1.64 (±0.26)   1.28 (±0.21)     0 (0)
3       1.28 (±0.21)   0.69 (±0.14)     0 (0)

TABLE IV
The average Just Noticeable Difference threshold where $R_H = 5$ (number within parentheses is the 95% confidence interval value).

Tiles   Crowd-Run      Old-Town-Cross   Rush-Hour
16x9    3.68 (±0.52)   3.25 (±0.47)     0.81 (±0.23)
80x45   3.30 (±0.48)   3.04 (±0.44)     0.76 (±0.20)

TABLE V
The average Just Unacceptable Difference threshold where $R_H = 5$ (number within parentheses is the 95% confidence interval value).

Tiles   Crowd-Run      Old-Town-Cross   Rush-Hour
16x9    2.03 (±0.31)   1.76 (±0.27)     0 (0)
80x45   1.76 (±0.29)   1.63 (±0.25)     0 (0)

From the results, we can draw the following observations.

Feasibility of Mixing Tile Resolutions. The measured
thresholds confirm the feasibility of mixed-resolution tiled
video. The CDF in Figure 6 implies that we can mix tiles
with resolution levels 5 and 4 without the difference being
noticed in most cases. Further, the result depicted in Figure 7
indicates that more than 85% of participants accept the quality
difference in configurations where $3 \le R_L < R_H = 5$;
under these configurations, up to 30% bandwidth can be saved
by mixing tile resolutions. When we construct video from tiles
at resolution levels 5 and 2, almost all participants noticed the
difference for videos Crowd-Run and Old-Town-Cross. However,
40% to 65% of the participants still accepted the quality
difference.

Impact of Content. With the same configuration, the results
in Tables II and III show a large disparity in the measured
average JND and JUD across the three test videos. Overall,
video Crowd-Run, which has the most motion among the three
test videos, is the most sensitive to resolution mixing, as the
highest average thresholds and the greatest variation are
detected for it. Interestingly, video Rush-Hour, which has the
least motion among the three test videos, behaves remarkably
differently from the others. The quality difference between its
mixed-resolution tiled video and the standard version is
difficult to notice, so its average measured thresholds and
their variations are much smaller than those of the other test
videos.

Gap between JND and JUD Thresholds. In many cases,
although participants could notice the difference, they still
found it acceptable. Generally, a larger gap indicates a higher
tolerance of quality degradation once the difference is
noticeable. From Tables II and III, we observe a significant
gap between the average measured JND and JUD thresholds,
especially for $R_H = 5$. In particular, the average gaps for
videos Crowd-Run and Old-Town-Cross with $R_H = 5$ are 1.65
and 1.49, respectively. As the tolerance space shrinks with
smaller $R_H$, the gap between JND and JUD generally shrinks
as well, as can be seen in both tables.

Impact of Tile Size. The comparison between the configurations
with 16x9 tiles and 80x45 tiles is presented in Tables IV
and V. The threshold values with 80x45 tiles are slightly
smaller than the corresponding threshold values with 16x9
tiles, which indicates that the quality degradation introduced
by mixing resolutions is slightly less obvious for the
finer-grained tiling (80x45) than for the coarse-grained tiling
(16x9). Finer-grained tiles, however, are generally less
efficient in terms of encoding and transmission bandwidth.
Therefore, we need to balance the trade-off between video
quality and efficiency to obtain an appropriate configuration.

Fig. 6. CDF of participants who cannot notice any difference between mixed-resolution tiled video $(5, R_L)$ and standard HD tiled video $(5, 5)$, for Crowd-Run, Old-Town-Cross, and Rush-Hour.

C. Summary 

The subjective assessment demonstrated that, in most cases,
the perceptual quality loss of mixing resolutions in tiled video
is insignificant as long as the gap between the mixed
resolution levels is small. From the evaluation results, we can
draw the following two important observations:

• In most cases, tiles from the 1920x1080p stream and
the 1600x900p stream can be mixed together without the
difference being noticed;

• Even when participants notice quality degradation
in videos combining tiles from the 1920x1080p stream
and tiles from the 960x540p stream, more than 80% of
participants still accept the quality difference for the low
and medium motion videos, and more than 40% of
participants accept it for the dense motion video.

This section confirms the feasibility of the mixed-resolution
tiling scheme, which we apply to wireless multicast of tiled
video streams in the rest of this paper. Instead of randomly
mixing tile resolutions, we study how to optimally allocate a
resolution version to each tile to better utilize the wireless
bandwidth and improve the overall utility of users.

Fig. 7. CDF of participants who accept the quality difference between mixed-resolution tiled video $(5, R_L)$ and standard HD tiled video $(5, 5)$, for Crowd-Run, Old-Town-Cross, and Rush-Hour; the horizontal axis is the video configuration $(R_H, R_L)$.

IV. Problem Definition

We now describe an optimization problem that determines
which tile should be sent at which resolution and at which
link rate, given the wireless network constraint. Let $T$ be the
average number of slots available for the delivery of a single
frame, where a slot is the minimum transmission time unit in
an 802.11 network (e.g., 9 μs in 802.11a). The wireless
network supports $N_r$ different link rates. Let $n$ be the number
of users in our system; the physical link rates of these $n$
clients are $r_1, r_2, \ldots, r_n$. Without loss of generality, we
assume that the link rate $r_i$ is a non-decreasing function of
the index $i$.
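The slot-based cost of a transmission can be illustrated with a small conversion helper. It assumes sizes in bytes, link rates in Mbit/s, and the 9 μs 802.11a slot, and it ignores PHY preambles, MAC headers, contention, and retransmissions; `airtime_slots` and the example tile size are hypothetical.

```python
SLOT_US = 9.0  # 802.11a slot time in microseconds


def airtime_slots(size_bytes, rate_mbps, slot_us=SLOT_US):
    """Transmission time of size_bytes at rate_mbps, expressed in 802.11 slots.

    A rate in Mbit/s equals bits per microsecond. PHY preambles, MAC headers,
    contention and retransmissions are all ignored in this sketch.
    """
    airtime_us = size_bytes * 8 / rate_mbps
    return airtime_us / slot_us


# A hypothetical 6000-byte tile at the lowest and highest 802.11a rates.
print(round(airtime_slots(6000, 6), 1))   # 888.9
print(round(airtime_slots(6000, 54), 1))  # 98.8
```

The result is left fractional; rounding up to whole slots is an alternative if an implementation schedules in integer slots.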

We generate $M$ resolution versions (or levels) for each
frame, and every frame is broken into $N_g$ view regions; each
view region is termed a tile (or grid). Instead of using the
y-x notation in Figure 1, we simply number the tiles
$1, 2, \ldots, N_g$ when we discuss the algorithm. A tile is
considered a logical entity: when transmitted, a tile must have
a specific resolution level. A tile $g$ with resolution level $m$
($1 \le m \le M$) is denoted by $g^m$, and its size is $s_g^m$.
The sequence $s_g^1, s_g^2, \ldots, s_g^M$ is strictly increasing.

The set of tiles in the RoI of user $i$ is denoted as $G(i)$.
Let $R_i$ be the resolution level requested by user $i$. Under
restricted bandwidth, we may not be able to satisfy all user
requests. As a result, some tiles may be streamed at
resolution levels lower than the desired resolution level $R_i$.
To avoid significant perceptual quality loss introduced by
downgrading tile resolution levels, each user $i$ has a lower
bound $L_i$ on the tile resolution levels, which is guaranteed
to be satisfied. More specifically, for every tile in $G(i)$, the
resolution level to be decoded (the highest received level) by
user $i$ should be at least $L_i$.

Receiving $g^m$ at user $i$ yields a utility $u_{g,i}^m$ that
follows the rules below:

• If $g \notin G(i)$, then $u_{g,i}^m = 0$ (for all $1 \le m \le M$);

• If $g \in G(i)$ and $m < L_i$, then $u_{g,i}^m = -\infty$;

• If $g \in G(i)$ and $L_i \le m < m' \le R_i$, then $u_{g,i}^m < u_{g,i}^{m'}$;

• If $g \in G(i)$ and $m > R_i$, then $u_{g,i}^m = u_{g,i}^{R_i}$.

For simplicity, we use a tile-size-based utility assignment
mechanism. In particular, $u_{g,i}^{R_i}$ is the maximum
achievable utility at user $i$ from receiving tile $g$, and the
utilities of receiving other levels are proportional to the
corresponding tile sizes. The utility function, however, can be
any general function (e.g., the PSNR of tiles) subject to the
above rules.
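The tile-size-based assignment can be written as a small function that follows the four rules above. The maximum utility `u_max` and the tile sizes in the example are hypothetical values chosen for illustration, and `utility` is an illustrative name.

```python
import math


def utility(g, m, roi, r_i, l_i, u_max, sizes):
    """Size-based utility u_{g,i}^m of receiving tile g at level m for user i.

    roi:   the set G(i) of tiles in the user's RoI
    r_i:   requested level R_i
    l_i:   guaranteed level L_i
    u_max: utility of receiving g at level R_i (hypothetical value)
    sizes: sizes[m] = s_g^m, strictly increasing in m
    """
    if g not in roi:
        return 0.0          # tile outside the RoI: no utility
    if m < l_i:
        return -math.inf    # below the guaranteed level: infeasible
    m = min(m, r_i)         # levels above R_i add nothing
    return u_max * sizes[m] / sizes[r_i]  # proportional to tile size


# Hypothetical sizes (bytes) of one tile at levels 1..5.
SIZES = {1: 1000, 2: 2000, 3: 3500, 4: 5000, 5: 7000}
print(utility(7, 3, {7, 8}, r_i=4, l_i=2, u_max=10.0, sizes=SIZES))  # 7.0
print(utility(7, 5, {7, 8}, r_i=4, l_i=2, u_max=10.0, sizes=SIZES))  # 10.0
```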

Given the RoI selection and the corresponding utility
assignment of tiles at each resolution level for each user, the
objective is to maximize the total utility received by all users
subject to the total transmission slot constraint.

Lastly, we discuss the parameter settings for the average
number of available time slots $T$ and the tile size at a
specific resolution level. All pixels belonging to the same tile
across different frames are encoded as a group of pictures
(GOP). Due to the dependency within a GOP, if we pick a
resolution level $m$ for a tile, we have to transmit this tile at
the same resolution $m$ for all frames within the same GOP. In
our model, we therefore model $s_g^m$ as the average tile size
in a GOP and the average number of time slots available per
frame as $T$. In the implementation, however, the time slots
allocated to frames in a GOP are distributed in proportion to
the actual frame sizes, as there is considerable diversity in
the sizes of I, B, and P frames.
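The proportional distribution of slots within a GOP can be sketched as follows. Only proportionality to frame size comes from the text; the integer rounding with the largest-remainder method, the helper name `distribute_slots`, and the example frame sizes are assumptions of this illustration.

```python
def distribute_slots(frame_sizes, avg_slots_per_frame):
    """Split a GOP's slot budget over its frames in proportion to frame size.

    The GOP budget is avg_slots_per_frame * len(frame_sizes), where
    avg_slots_per_frame is an integer. Fractional shares are rounded with the
    largest-remainder method so that the total budget is preserved.
    """
    total = avg_slots_per_frame * len(frame_sizes)
    size_sum = sum(frame_sizes)
    exact = [total * s / size_sum for s in frame_sizes]
    alloc = [int(share) for share in exact]  # floor of each share
    leftover = total - sum(alloc)
    # Hand the remaining slots to the frames with the largest fractional parts.
    by_remainder = sorted(range(len(exact)),
                          key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc


# Hypothetical GOP of one I, one P and two B frames (sizes in bytes), T = 100.
print(distribute_slots([30000, 10000, 5000, 5000], 100))  # [240, 80, 40, 40]
```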


TABLE VI
Key Notations in the Algorithm

Notation             Definition
$T$                  Average available time slots for delivering one frame
$n$                  Number of users (or clients)
$r_i$                The link rate of user $i$
$N_r$                Number of different link rate levels
$M$                  The number of available resolution levels
$G(i)$               The set of tiles in the RoI of user $i$
$R_i$                Resolution level requested by user $i$ ($R_i \ge 1$)
$L_i$                Resolution level guaranteed to be satisfied for user $i$ ($L_i \ge 1$)
$N_g$                Total number of tiles (or grids)
$g^m$                Tile $g$ at resolution level $m$
$s_g^m$              The size (in bytes) of tile $g$ at resolution level $m$
$u_{g,i}^m$          The utility of tile $g$ at resolution level $m$ for user $i$
$\mathcal{M}(g,i)$   The highest received resolution level of tile $g$ for user $i$

The key notations used in this paper are summarized in
Table VI.


V. Optimal Broadcast Algorithm 

This section presents a dynamic programming algorithm to 
solve the utility maximization problem defined in the previous 
section. The solution consists of three major components: 
(i) an algorithm that determines an appropriate quality lower 
bound for each user; (ii) an optimal algorithm for determining 
the link rate and resolution level of a single tile; and (iii) an 
efficient algorithm for determining the link rate and resolution 
level over multiple tiles. 

A. Adaptive Utility Assignment 

The mixture of resolution levels could result in two potential issues when the available bandwidth is insufficient to meet the requirements of all users. First, as discussed in Section III, a significant disparity of resolution levels between the tiles of a user may severely impair visual perception. Second, the utility-oriented optimization algorithm could result in severe unfairness. To address these issues, we propose an algorithm that adaptively tunes the lower bound $L_i$ ($1 \le i \le n$) of the resolution level that is guaranteed to be satisfied.

Recall that $R_i$ is the resolution level requested by user $i$, and $L_i$ is the resolution level guaranteed to be satisfied for user $i$ among its tiles of interest. Given $R_i$ and $L_i$, the rules for utility assignment are specified in Section IV. Clearly, when the requests of all users are satisfied, we have $L_i \ge R_i$ for any $1 \le i \le n$, and the overall utility is optimal. Hence, we set $L_i = R_i$ at the beginning, then validate the feasibility of the current configuration of $L_i$ and adapt it accordingly.

We define an indicator variable $x_{g,i}^m$, which takes the value 1 if resolution level $m$ of tile $g$ is transmitted at link rate $r_i$, and 0 otherwise. Let $\mathcal{M}(g,i)$ be the maximum resolution level of tile $g$ received by user $i$. Since user $i$ can only receive transmissions at link rates not higher than $r_i$, $\mathcal{M}(g,i)$ can be written as $\mathcal{M}(g,i) = \max\{m \mid x_{g,i'}^m = 1,\ 1 \le i' \le i\}$. We can now formulate the feasibility validation problem as


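$$\text{maximize} \quad \sum_{g=1}^{N_g} \sum_{i=1}^{n} u_{g,i}^{\mathcal{M}(g,i)} \tag{1}$$

$$\text{subject to} \quad \sum_{g=1}^{N_g} \sum_{m=1}^{M} \sum_{i=1}^{n} x_{g,i}^m \frac{s_g^m}{r_i} \le T, \tag{2}$$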


where $u_{g,i}^{\mathcal{M}(g,i)} = -\infty$ if $g \in G(i)$ and $\mathcal{M}(g,i) < L_i$, and the unit of $s_g^m / r_i$ is an 802.11 time slot. To obtain an appropriate setting of $L_i$, we keep decreasing $L_i$ by 1 for all $i$ until Problem (1) is feasible subject to the time limit constraint (2).

To solve the feasibility problem defined above, we first independently calculate the minimum number of time slots required for every tile $g$ ($1 \le g \le N_g$) and then sum the required time slots across all $N_g$ tiles. If the current lower bound requirement ($L_i$) is achievable, the total number of required slots is at most $T$. The following paragraph presents an algorithm to calculate the minimum number of time slots required for a single tile $g$.

For user $i$, the lower bound requirement of resolution level $L_i$ can be satisfied either by transmitting at link rate $r_i$ or at a lower link rate $r_{i'}$, where $1 \le i' < i$. Define $\mathcal{R}_g(i,l)$ as the minimum number of time slots required to meet the non-negative utility requirement of users $1, \dots, i$, given that resolution level $l$, required by users with indexes larger than $i$, has not yet been satisfied. The recursive equation for $\mathcal{R}_g(i,l)$ can be written as

$$\mathcal{R}_g(i,l) = \begin{cases} \min\left\{ \mathcal{R}_g(i-1, H),\ \mathcal{R}_g(i-1, 0) + \dfrac{s_g^{H}}{r_i} \right\}, & \text{if } g \in G(i); \\[2mm] \mathcal{R}_g(i-1, l), & \text{if } g \notin G(i), \end{cases} \tag{3}$$

where $H = \max\{l, L_i\}$. The minimum number of time slots required for delivering tile $g$ while satisfying the quality lower bound is $\mathcal{R}_g(n, 0)$, which can be easily calculated using recursion (3). The feasibility validation problem then reduces to checking $\sum_{g=1}^{N_g} \mathcal{R}_g(n, 0) \le T$.
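The feasibility check and the adaptive lowering of $L_i$ can be sketched as follows. This is a minimal illustration, not the authors' code: users are assumed to be sorted by ascending link rate, sizes are assumed to be in units such that size divided by rate gives time slots, and the base case $\mathcal{R}_g(0,0) = 0$, $\mathcal{R}_g(0,l) = \infty$ for $l > 0$ (a pending requirement with no lower-rate user left cannot be met) is our assumption, since it is not stated above. The floor $L_i \ge 1$ in the adaptation loop is likewise assumed.

```python
import math


def min_slots_for_tile(g, users, size, lower):
    """Compute R_g(n, 0) with recursion (3), bottom-up over users.

    users: list of (rate, roi) sorted by ascending link rate; roi is a set of tile ids.
    size[g][m]: size of tile g at level m = 1..M (size[g][0] is unused),
                in units such that size / rate gives time slots (assumption).
    lower[i]: current lower bound L_i of the i-th user (0-based).
    Assumed base case: R_g(0, 0) = 0 and R_g(0, l) = inf for l > 0.
    """
    M = len(size[g]) - 1
    R = [0.0] + [math.inf] * M          # R_g(0, l) for l = 0..M
    for i, (rate, roi) in enumerate(users):
        if g not in roi:
            continue                    # R_g(i, l) = R_g(i-1, l)
        prev = R
        R = []
        for l in range(M + 1):
            H = max(l, lower[i])
            # either defer level H to a lower-rate user, or send it now at r_i
            R.append(min(prev[H], prev[0] + size[g][H] / rate))
    return R[0]


def adapt_lower_bounds(tiles, users, size, requested, T):
    """Start from L_i = R_i and lower every L_i by 1 until sum_g R_g(n, 0) <= T.

    Assumption: L_i never drops below 1; if even all-ones is infeasible, the
    all-ones bounds are returned together with their (too large) slot total.
    """
    lower = list(requested)
    while True:
        total = sum(min_slots_for_tile(g, users, size, lower) for g in tiles)
        if total <= T or all(L == 1 for L in lower):
            return lower, total
        lower = [max(1, L - 1) for L in lower]
```

For example, with two users at 6 and 36 Mbps interested in one tile and a budget too small for the requested level, `adapt_lower_bounds` lowers both bounds by one level and reports the new slot total.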











B. Optimal Allocation for a Single Tile 

For ease of analysis, we begin by designing an optimal resource allocation algorithm for a single tile, which we denote as $g$. The optimal allocation approach determines which resolution levels of tile $g$ are transmitted and the link rate of each transmission.

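1) Optimal Allocation Algorithm: Let $t$ ($0 \le t \le T$) be the total number of slots available for the transmissions of tile $g$. The utility optimization problem can be formulated as

$$\text{maximize} \quad \sum_{i=1}^{n} u_{g,i}^{\mathcal{M}(g,i)} \quad \text{subject to} \quad \sum_{m=1}^{M} \sum_{i=1}^{n} x_{g,i}^m \frac{s_g^m}{r_i} \le t. \tag{4}$$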

As we assume that users with a higher link rate can receive all transmissions at lower rates, we have the following important observation: for any tile, a higher resolution version is always transmitted at a higher link rate in an optimal allocation (otherwise, every user able to receive the lower-resolution version would also receive the higher-resolution one, making the former transmission redundant). Using this observation, we define the maximum utility function as follows. For tile $g$, define $U_g(i,m,t)$ as the optimal utility for users $1, 2, \dots, i$, with resolution levels up to $m$, within transmission time limit $t$.

Every state $U_g(i,m,t)$ falls into one of two categories: either user $i$ is not interested in tile $g$, or user $i$ is interested in tile $g$. If user $i$ is not interested in tile $g$ ($g \notin G(i)$), the state transition equation is simply

$$U_g(i,m,t) = U_g(i-1,m,t). \tag{5}$$

It is slightly more complicated to analyze the transitions of state $U_g(i,m,t)$ when user $i$ is interested in tile $g$. There are two transition possibilities for this state:

(i) If resolution level $m$ of $g$ is not transmitted, the recursive function is

$$U_g(i,m,t) = U_g(i,m-1,t). \tag{6}$$

(ii) If resolution level $m$ is transmitted at link rate $r_{i'}$ ($i' \le i$), users $i', \dots, i$ all receive level $m$, and the recursive function is

$$U_g(i,m,t) = \max_{1 \le i' \le i} \left\{ U_g\!\left(i'-1,\, m-1,\, t - \frac{s_g^m}{r_{i'}}\right) + \sum_{h=i'}^{i} u_{g,h}^{m} \right\}. \tag{7}$$

The terminating conditions for the recursion and the corresponding value assignments are

$$U_g(i,m,t) = -\infty, \quad \text{if } t < 0 \text{ or } m < 0;$$

$$U_g(0,m,t) = 0, \quad \text{if } t \ge 0 \text{ and } m \ge 0.$$

We start the recursion from state $U_g(n, M, t)$ with the given available time slots $t$, the highest resolution level $M$, and user $n$ with the highest link rate. The recursion can be solved by applying Eqs. (5), (6), and (7). The transition complexity of Eqs. (5) and (6) is $O(1)$. Eq. (7) enumerates the user link rate for every transmission to find the optimal transition, so its transition complexity is $O(n)$. Since there are $O(ntM)$ states, the overall computational complexity of our optimization algorithm is $O(n^2 tM)$, which grows quadratically with $n$.
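A bottom-up evaluation of Eqs. (5)-(7) is sketched below. It is a minimal illustration under stated assumptions: users are indexed in ascending order of link rate, each transmission cost $s_g^m / r_{i'}$ is pre-rounded to an integer number of slots, users not interested in the tile carry utility 0, and levels below a user's lower bound carry utility $-\infty$. The function and parameter names are hypothetical.

```python
NEG_INF = float("-inf")


def single_tile_dp(interested, util, cost, t_max):
    """Evaluate U_g(i, m, t) for one tile g with Eqs. (5)-(7), bottom-up.

    Users are indexed 1..n in ascending order of link rate (list index i-1).
    interested[i-1]: True if g is in G(i).
    util[i-1][m]: utility u_{g,i}^m for m = 1..M (index 0 unused); 0 for users
                  not interested in g, -inf below the lower bound L_i (assumption).
    cost[m][k]: slots to send level m at user k+1's rate, i.e. s_g^m / r_{k+1}
                pre-rounded to a non-negative integer (assumption).
    Returns the table U with U[i][m][t] for i = 0..n, m = 0..M, t = 0..t_max.
    """
    n, M = len(util), len(util[0]) - 1
    U = [[[NEG_INF] * (t_max + 1) for _ in range(M + 1)] for _ in range(n + 1)]
    U[0] = [[0.0] * (t_max + 1) for _ in range(M + 1)]   # U_g(0, m, t) = 0
    for i in range(1, n + 1):
        for m in range(M + 1):
            for t in range(t_max + 1):
                if not interested[i - 1]:
                    U[i][m][t] = U[i - 1][m][t]            # Eq. (5)
                    continue
                # Eq. (6); at m = 0 this refers to level -1, i.e. -inf
                best = U[i][m - 1][t] if m >= 1 else NEG_INF
                if m >= 1:
                    gain = 0.0
                    for ip in range(i, 0, -1):             # Eq. (7): i' = i, ..., 1
                        gain += util[ip - 1][m]            # sum_{h=i'}^{i} u_{g,h}^m
                        rest = t - cost[m][ip - 1]
                        if rest >= 0:                      # negative t gives -inf
                            best = max(best, U[ip - 1][m - 1][rest] + gain)
                U[i][m][t] = best
    return U
```

The answer for the tile is `U[n][M][t]`; the three nested state loops and the inner loop over `ip` make the $O(n^2 tM)$ cost visible.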




2) Virtual Clustering: This section applies a clustering method to make our optimal algorithm scalable with $n$ (the number of users). Since Eq. (7) is the most time-consuming operation, we concentrate on analyzing this equation.

Assuming that a specific link rate $r_{i'}$ is used for transmitting resolution level $m$ in Eq. (7), all clients with a link rate no smaller than $r_{i'}$ are able to receive this resolution level. Hence, instead of enumerating users $i'$, only the distinct link rates need to be considered. As a consequence, we can cluster users with identical link rates into a virtual user in the algorithm. The clustering is achieved by simply summing the corresponding utility values: the utility of tile $g$ at resolution level $m$ for a virtual user at link rate $r$ is defined as $\sum_{h:\, r_h = r} u_{g,h}^m$.

By clustering, the number of users $n$ is reduced to at most $N_r$, the number of distinct link rates. As the number of link rate levels is small (8 in 802.11a [?]), with user clustering our algorithm scales to any number of users, provided that frame losses and retransmissions are not considered.
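The clustering step amounts to grouping users by link rate and summing their per-level utilities. A minimal sketch, assuming per-user inputs in the same shape as the recursion of Eqs. (5)-(7) (link rates, interest flags, and utility lists indexed by level with index 0 unused); treating a virtual user as interested when any member is interested is our assumption, and the function name is hypothetical.

```python
from collections import defaultdict


def cluster_users(rates, interested, util):
    """Merge users with identical link rate into one virtual user per rate.

    Returns (virtual rates in ascending order, virtual interest flags,
    virtual utilities) where a virtual utility at level m is the sum of its
    members' utilities at level m.
    """
    groups = defaultdict(list)
    for h, r in enumerate(rates):
        groups[r].append(h)
    v_rates, v_interested, v_util = [], [], []
    for r in sorted(groups):                      # keep ascending-rate order
        members = groups[r]
        M = len(util[members[0]]) - 1
        v_rates.append(r)
        v_interested.append(any(interested[h] for h in members))
        # sum_{h: r_h = r} u_{g,h}^m for m = 1..M (index 0 unused)
        v_util.append([0.0] + [sum(util[h][m] for h in members) for m in range(1, M + 1)])
    return v_rates, v_interested, v_util
```

The outputs can be used in place of the per-user inputs of the recursion in Eqs. (5)-(7), so the inner enumeration over $i'$ runs over at most $N_r$ virtual users.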


C. Optimal Allocation for Multiple Tiles 

This section presents an algorithm that is able to achieve the maximum utility by optimally allocating resources over all $N_g$ tiles. First, we extend the algorithm in Section V-B to incorporate multiple tiles. Next, we analyze the computational complexity of the algorithm and demonstrate its inefficiency. Finally, we reduce the computational overhead of the algorithm to make it more efficient and practical for deployment.

Given time limit $t(g)$ for tile $g$, the optimal utility is $U_g(n, M, t(g))$, as calculated in Section V-B. The overall system utility is the sum of the utilities over all $N_g$ tiles, so the optimization problem can be represented as

$$\text{maximize} \quad \sum_{g=1}^{N_g} U_g(n, M, t(g)) \quad \text{subject to} \quad \sum_{g=1}^{N_g} t(g) \le T. \tag{8}$$

Problem (8) thus amounts to optimally distributing the total time slots $T$ among all tiles.

Define $U(g,t)$ as the maximum utility achieved with tiles $1$ to $g$ within time limit $t$. Enumerating the time slots $t'$ allocated to the transmissions of tile $g$ yields

$$U(g,t) = \max_{0 \le t' \le t} \left\{ U(g-1, t-t') + U_g(n, M, t') \right\}. \tag{9}$$

The maximum system utility is $U(N_g, T)$. The same equation is also employed by Li et al. to incorporate the allocation of multiple multicast sessions into their optimal algorithm.
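Eq. (9) is a knapsack-style combination of per-tile tables. The sketch below assumes integer slots and that each tile's values $U_g(n, M, t)$ for $t = 0, \dots, T$ have already been computed (for example with the single-tile recursion); the function name and the backtracking of the per-tile allocation $t(g)$ are illustrative additions.

```python
NEG_INF = float("-inf")


def combine_tiles(tile_tables, T):
    """Eq. (9): U(g, t) = max_{0<=t'<=t} U(g-1, t-t') + U_g(n, M, t').

    tile_tables[g][t] holds U_g(n, M, t) for t = 0..T (precomputed per tile).
    Returns (U(N_g, T), list of slots t(g) allocated to each tile).
    """
    prev = [0.0] * (T + 1)            # U(0, t) = 0 for every t >= 0
    choices = []
    for table in tile_tables:
        cur, arg = [NEG_INF] * (T + 1), [0] * (T + 1)
        for t in range(T + 1):
            for tp in range(t + 1):    # O(T) transition per state
                v = prev[t - tp] + table[tp]
                if v > cur[t]:
                    cur[t], arg[t] = v, tp
        choices.append(arg)
        prev = cur
    # backtrack the per-tile slot allocation t(g)
    alloc, t = [], T
    for arg in reversed(choices):
        alloc.append(arg[t])
        t -= arg[t]
    return prev[T], alloc[::-1]
```

The double loop over `t` and `tp` for each of the $N_g$ tiles is exactly the $O(T^2 N_g)$ recursion cost discussed next.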

We now discuss the complexity of this multiple-tile allocation algorithm. Precomputing all $U_g(n, M, t)$, where $1 \le g \le N_g$ and $0 \le t \le T$, takes $O(n^2 TMN_g)$. As shown in Eq. (9), the transition complexity of each state is $O(T)$, so the complexity of the recursion that calculates $U(N_g, T)$ is $O(T^2 N_g)$. Combining the precomputation and the recursion gives $O(n^2 TMN_g + T^2 N_g)$ in total.

The parameters $n$ (reduced to $N_r$), $M$, and $N_g$ are constants for a given video, so the computational cost depends on $T$. Assuming a video frame rate of 25 fps, the time available on average for a single frame is 40 ms, or about 4444 slots (9 μs per slot in 802.11a). When this value of $T$ is substituted into $O(n^2 TMN_g + T^2 N_g)$, the overhead is clearly too large to be practical. Therefore, it is essential to further reduce the computational complexity.
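To make the scale concrete, the following back-of-the-envelope count evaluates both terms, assuming $N_r = 8$ link rates (802.11a), $M = 5$ resolution levels, and $N_g = 16 \times 9 = 144$ tiles as in the evaluation setup (Table VII). These are counts of elementary transitions for one run with $T$ set to one frame's worth of slots, not measured running times.

```python
SLOT_US = 9                 # 802.11a slot time (microseconds)
FRAME_US = 1_000_000 // 25  # 40 ms per frame at 25 fps

T = FRAME_US // SLOT_US     # whole slots per frame
N_r, M, N_g = 8, 5, 16 * 9  # link rates, resolution levels, tiles

precompute = N_r**2 * T * M * N_g   # O(N_r^2 T M N_g) term
recursion = T**2 * N_g              # O(T^2 N_g) term of Eq. (9)
print(T, precompute, recursion)
```

This prints 4444 slots, about $2.0 \times 10^8$ precomputation transitions, and about $2.8 \times 10^9$ recursion transitions, so the $T^2 N_g$ term dominates by more than an order of magnitude.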

The key idea for reducing the computational overhead is to trade space for running time. Define the optimal utility function $U^*(g,i,m,t)$ as

$$U^*(g,i,m,t) = \max_{0 \le t' \le t} \left\{ U(g-1, t-t') + U_g(i,m,t') \right\}. \tag{10}$$

As in the analysis of the single-tile allocation algorithm, the category of each state $U^*(g,i,m,t)$ depends on whether user $i$ is interested in tile $g$.

If user $i$ is not interested in tile $g$, substituting Eq. (5) into Eq. (10) yields

$$U^*(g,i,m,t) = \max_{0 \le t' \le t} \left\{ U(g-1, t-t') + U_g(i-1,m,t') \right\} = U^*(g,i-1,m,t). \tag{11}$$


On the other hand, if user $i$ is interested in tile $g$, substituting Eqs. (6) and (7) into Eq. (10) yields

$$\begin{aligned} U^*(g,i,m,t) &= \max_{0 \le t' \le t} \left\{ U(g-1, t-t') + \max\left\{ U_g(i,m-1,t'),\ \max_{1 \le i' \le i} \left\{ U_g\!\left(i'-1, m-1, t' - \frac{s_g^m}{r_{i'}}\right) + \sum_{h=i'}^{i} u_{g,h}^m \right\} \right\} \right\} \\ &= \max\left\{ U^*(g,i,m-1,t),\ \max_{1 \le i' \le i} \left\{ U^*\!\left(g, i'-1, m-1, t - \frac{s_g^m}{r_{i'}}\right) + \sum_{h=i'}^{i} u_{g,h}^m \right\} \right\}. \end{aligned} \tag{12}$$

The initial conditions and recursive transitions at the boundaries for $U^*(g,i,m,t)$ are

$$U^*(g,i,m,t) = -\infty, \quad \text{if } t < 0 \text{ or } m < 0;$$

$$U^*(g,0,m,t) = U^*(g-1,n,M,t), \quad \text{if } g \ge 1,\ t \ge 0,\ m \ge 0;$$

$$U^*(0,i,m,t) = 0, \quad \text{if } t \ge 0,\ m \ge 0.$$

Recursions (11) and (12) define the procedure for solving the optimal multiple-tile allocation problem. The maximum utility is $U^*(N_g, n, M, T)$.

The transition in Eq. (11) has $O(1)$ complexity. Eq. (12) enumerates the user index $i'$ instead of time slots, so its transition complexity is $O(n)$. Taking all transitions into consideration, the total computational complexity is $O(n^2 TMN_g)$, where $n$ can be replaced by $N_r$ by clustering users according to the available link rate levels. Compared with the previous multiple-tile allocation algorithm, the $O(T^2 N_g)$ term is eliminated, so the dependence on $T$ drops from quadratic to linear. In the evaluation section, we demonstrate the effectiveness of our optimal algorithm.


VI. Experimental Setup

To evaluate our algorithm, we set up the following experimental system.


A. System Setup

Our system uses a zoomable video streaming server that runs on a Mac Pro with a 3.2GHz quad-core processor and 8GB memory. The proxy runs on a MacBook with a 2.9GHz dual-core processor and 8GB memory. The video server, proxy, and WiFi AP used for multicast are all connected through wired Ethernet. The mobile devices, all Samsung Galaxy S III, communicate with the AP using IEEE 802.11a operating at 5GHz.

The AP is equipped with two Compex IEEE 802.11a/b/g adapters featuring the Atheros AR5414 chipset and runs OpenWRT Kamikaze 7.09 with kernel version 2.6.25.16. The wireless adapter driver is MadWifi (version 0.9.4). To enable packet-level rate assignment, we use the Click modular router [?] (version 1.6.0). For each video packet transmission, we extract the rate value specified by the proxy in the packet header, then pass the assigned rate value to the MadWifi driver. The setup is shown in Figure 8.

Fig. 8. System Setup.

B. Rate Adaptation

As the WiFi SNR values on the mobile devices are not available, we use frame loss as the basis for rate adaptation [?], [?], [23]. In particular, we implement the History-Aware Robust Rate Adaptation Algorithm (HA-RRAA) [?], which extends RRAA [25].

RRAA uses two parameters, Maximum Tolerable Loss (MTL) and Opportunistic Rate Increase (ORI), for rate adaptation. The corresponding thresholds are denoted by $P_{MTL}$ and $P_{ORI}$, where $P_{ORI} < P_{MTL}$. RRAA measures the frame loss rate $P$ over an estimation window and adapts the link rate as follows. The rate decreases to the next lower one if $P > P_{MTL}$. If $P < P_{ORI}$, the rate increases to the next higher one. When $P$ lies between $P_{ORI}$ and $P_{MTL}$, the current rate is retained.

To limit transmissions at adjacent high-loss rates, HA-RRAA was proposed [?]. HA-RRAA exponentially increases the window size of the next lower rate upon transmission failure at the current rate ($P > P_{MTL}$) and resets the window size when transmissions at the current rate are successful ($P < P_{MTL}$). To be as responsive to fast channel deterioration as RRAA, the algorithm additionally computes the loss over a small window. When the loss rate over this small window is greater than $P_{MTL}$, the current rate is directly moved to the next lower rate.

From our experiments, we observe that the HA-RRAA tuning mechanism may still result in oscillation between two adjacent rates. We slightly modify the algorithm so that the window size is halved instead of being reset when transmissions at the current rate are successful. Furthermore, since we may broadcast packets at different rates under heterogeneous links, a client may receive packets sent at a rate higher than its current rate; these packets serve as "free" probes that prevent a client from increasing its rate unnecessarily. As a result, our rate adaptation is stable and responsive.

For tractability, packet losses and frame retransmissions are not incorporated into our algorithm. Therefore, we use conservative threshold parameters. In particular, we set $P_{MTL} = 10\%$ and $P_{ORI} = 3\%$. The minimum estimation window size equals the interval between two consecutive runs of the allocation algorithm; this interval is also used as the small window to maintain responsiveness.
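The adaptation rules above can be summarized in a simplified sketch. It is not the authors' implementation: the class and method names, the use of frame counts per window, and the `probe_ok` hook that models the "free" probes from overheard higher-rate broadcasts are our own assumptions. The thresholds are the values stated above, and the rate list is the standard 802.11a rate set.

```python
RATES = [6, 9, 12, 18, 24, 36, 48, 54]   # 802.11a PHY rates (Mbps)
P_MTL, P_ORI = 0.10, 0.03                # thresholds used in this paper


class RateAdapter:
    """Simplified RRAA-style adapter with the modifications described above.

    The caller sends `window[idx]` frames at the current rate and then reports
    the losses via on_window(); on_small_window() is the fast-fallback check.
    """

    def __init__(self, min_window, rate_idx=0):
        self.min_window = min_window
        self.idx = rate_idx
        self.window = [min_window] * len(RATES)   # per-rate estimation window (frames)

    def on_window(self, lost, sent, probe_ok=None):
        # probe_ok: hypothetical hook for "free" probes; False means recently
        # overheard higher-rate broadcasts were lost, None means none were heard.
        p = lost / sent
        if p > P_MTL:
            if self.idx > 0:
                self.idx -= 1                      # step down one rate...
                self.window[self.idx] *= 2         # ...and grow its window exponentially
        else:
            # success: halve the window instead of resetting it (not below the minimum)
            self.window[self.idx] = max(self.min_window, self.window[self.idx] // 2)
            if p < P_ORI and self.idx + 1 < len(RATES) and probe_ok is not False:
                self.idx += 1                      # opportunistic rate increase
        return RATES[self.idx]

    def on_small_window(self, lost, sent):
        # responsiveness: drop one rate at once if short-term loss exceeds P_MTL
        if sent and lost / sent > P_MTL and self.idx > 0:
            self.idx -= 1
        return RATES[self.idx]
```

Halving rather than resetting means a rate that recently failed keeps a longer window for a while, which damps the oscillation between two adjacent rates.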

C. Video Coding and Streaming 

In the evaluation, we do not need to play the video on the mobile devices and hence do not send actual video data. Instead, we proceed as follows.

As depicted in Figure 1, each raw video frame from the test video is broken into $N_g$ tiles, and the tiles at the same y-x position are encoded using FFmpeg (version 1.2.1) with the H.264 codec at the server. During our experiments, instead of transmitting the corresponding tiles from the test video, the server simply transmits the same number of arbitrary bits as the actual video tile. Each transmission embeds metadata containing the tile size, y-x position, resolution level, and frame ID for identification. A client running on the mobile device extracts these fields from each received tile and periodically provides the reception bitmap to the server. When the transmission is over, we gather the reception bitmaps from all clients and reconstruct the mixed-resolution video frames with decoded tiles at the server side. The lost tiles (indicated by the bitmaps) in a group of pictures (GOP) are concealed by the default method in FFmpeg.
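The dummy-tile scheme can be illustrated with Python's standard `struct` module. The wire layout below (field order and widths) is purely hypothetical, since the format is not specified here; the 16-column grid matches the 16x9 tiling in Table VII, and the per-(frame, level) integer bitmask is one possible encoding of a reception bitmap.

```python
import struct

# Hypothetical wire layout (network byte order, no padding): tile size (uint32),
# tile row y (uint8), tile column x (uint8), resolution level (uint8), frame ID (uint32).
HEADER = struct.Struct("!IBBBI")
TILE_COLS = 16   # 16x9 tile grid, as in Table VII


def make_dummy_tile(size, y, x, level, frame_id):
    """Header plus `size` arbitrary bytes standing in for the encoded tile."""
    return HEADER.pack(size, y, x, level, frame_id) + bytes(size)


def record_reception(bitmaps, packet):
    """Parse the header and set the tile's bit in the per-(frame, level) bitmap."""
    size, y, x, level, frame_id = HEADER.unpack_from(packet)
    key = (frame_id, level)
    bitmaps[key] = bitmaps.get(key, 0) | (1 << (y * TILE_COLS + x))
    return size
```

A client would call `record_reception` for every received packet and periodically report the accumulated `bitmaps` dictionary to the server.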

VII. Evaluation 

In this section, we present the evaluation results of our proposed optimal multicast algorithm through extensive experiments using up to 10 mobile devices.

Compared Algorithms: We compare the performance of optimal multicast against the following baseline schemes, all of which also use HA-RRAA link adaptation.

Adaptive Unicast (aUnicast): This scheme transmits packets using wireless unicast only. To ensure that the lowest quality (resolution level 1) is received by every user, the algorithm calculates the number of time slots required to transmit every tile at resolution level 1. The algorithm then loops through each user and, if enough time slots remain, replaces the resolution of the tiles transmitted to that user with the desired resolution level. The loop terminates when the requests of all users are satisfied or the remaining time slots are insufficient for any user.
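A minimal sketch of the aUnicast baseline as described, assuming sizes are expressed in units such that size divided by rate gives time slots and that users are visited in list order (the visiting order is not specified above). Names are hypothetical.

```python
def adaptive_unicast(users, size, T):
    """Greedy aUnicast baseline (sketch).

    users: list of (rate, roi_tiles, requested_level), visited in list order.
    size[g][m]: size of tile g at level m (index 0 unused); size / rate is
                assumed to give time slots.
    Returns (levels, used_slots), or (None, used_slots) if level 1 does not fit.
    """
    def cost(rate, tiles, level):
        # unicast: every tile of the user's RoI is sent separately at the user's rate
        return sum(size[g][level] / rate for g in tiles)

    used = sum(cost(rate, roi, 1) for rate, roi, _ in users)
    if used > T:
        return None, used
    levels = [1] * len(users)
    for i, (rate, roi, req) in enumerate(users):
        extra = cost(rate, roi, req) - cost(rate, roi, 1)
        if used + extra <= T:            # upgrade this user if the slots allow it
            used += extra
            levels[i] = req
    # a single pass suffices: `used` only grows, so a user skipped now never fits later
    return levels, used
```

Because every tile is sent once per interested user, shared tiles are paid for repeatedly, which is why this baseline stops scaling as users are added.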


Adaptive Multicast (aMulticast): Similar to aUnicast, the lowest resolution level 1 is guaranteed for each user, and the remaining available slots are used to upgrade the resolution level tile by tile. As in DirCast [?], the link rate assigned to a particular tile is the lowest supported link rate among all interested users. As multicast is used, at most one multicast transmission is required for any tile.
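The aMulticast baseline can be sketched in the same style. Here the upgrade target of a tile is assumed to be the highest level requested by any interested user, and tiles are upgraded in the given order; both are our assumptions, as is the function name.

```python
def adaptive_multicast(users, size, tiles, T):
    """Sketch of aMulticast: one transmission per tile at the lowest interested rate.

    users: list of (rate, roi_set, requested_level).
    size[g][m]: size of tile g at level m (index 0 unused); size / rate = slots.
    Returns ({tile: (rate, level)}, used_slots), or (None, used_slots).
    """
    plan = {}
    for g in tiles:
        fans = [(r, req) for r, roi, req in users if g in roi]
        if fans:
            rate = min(r for r, _ in fans)          # DirCast-style rate choice
            target = max(req for _, req in fans)    # assumed upgrade target
            plan[g] = [rate, 1, target]
    used = sum(size[g][1] / rate for g, (rate, _, _) in plan.items())
    if used > T:
        return None, used
    for g, (rate, _, target) in plan.items():       # upgrade tile by tile
        extra = (size[g][target] - size[g][1]) / rate
        if used + extra <= T:
            used += extra
            plan[g][1] = target
    return {g: (rate, lvl) for g, (rate, lvl, _) in plan.items()}, used
```

Sending each tile at the slowest interested user's rate guarantees delivery to everyone, but it lets a single slow client inflate the airtime of every tile it shares.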


Approximation: We apply the approximation method in [10] to our maximization problem, using utility units instead of time slots as a state dimension in the dynamic programming. The approximation factor bound of this approach is $1 - \epsilon N_g$, so a better approximation factor is obtained with a finer-grained utility unit (a smaller $\epsilon$). However, as the computational complexity of the approximation algorithm grows quadratically with the number of utility units, a finer-grained utility unit significantly increases the computational complexity. In our experiments, we use $\epsilon = 0.2$, for which the running time is close to that of our optimal multicast.


In our experiments, all the above algorithms collect the RoI requests and run the allocation algorithm every 2 seconds. The average running time of our optimal algorithm is 49.18 ms, which incurs only about 2.5% overhead.
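The overhead figure follows directly from the two numbers above:

$$\frac{49.18\ \text{ms}}{2\ \text{s}} = \frac{49.18}{2000} \approx 0.0246 \approx 2.5\%.$$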


We measure the peak signal-to-noise ratio (PSNR), a standard metric of video quality, and the goodput of the system to compare the performance of the algorithms.


Video Setup: We evaluate the algorithms using two standard HD (1920x1080) test videos, controlled-burn (dense motion) and rush-hour (low motion)³. Table VII presents the video configurations and data rates.


Wireless Channels: We place the mobile devices at different locations and distances from the AP to vary the channel conditions between the mobile devices and the AP. Table VIII shows the minimum, maximum, and average achieved link rates with up to 10 mobile devices.


RoI Variation: User requests and RoIs used in the evaluations are based on real interaction logs from 10 users who have used the zoomable video system [?].


3 Available at http://media.xiph.org/video/derf/ 
TABLE VII
The data rate (Mbps) of different resolution levels.

level | resolution size | # tiles | low rate^a | medium rate^b | high rate^c
5 | 1920x1080 | 16x9 | 6.2 | 10.9 | 20.2
4 | 1600x900 | 16x9 | 4.5 | 6.6 | 11.1
3 | 1280x720 | 16x9 | 3.2 | 4.6 | 8.4
2 | 960x540 | 16x9 | 2.2 | 2.9 | 5.0
1 | 640x360 | 16x9 | 1.2 | 1.5 | 2.5

a Rush-hour, compressed using FFmpeg with parameter qp = 25.
b Controlled-burn, compressed with qp = 25.
c Controlled-burn, compressed with qp = 22.


TABLE VIII
The achieved link rates of mobile users (Mbps).

# users | min rate | max rate | average rate
1 | 6 | 6 | 6
3 | 6 | 36 | 20.0
5 | 6 | 36 | 21.6
8 | 6 | 36 | 22.5
10 | 6 | 36 | 21.0


A. Baseline Comparison 


Figure 9 depicts the average PSNR at the medium video rate, with error bars showing the standard deviation across users. The corresponding average goodput is presented in Table IX. As the unicast scheme cannot meet the lowest resolution level requirement for more than 5 clients, no data point is presented in this range. From the results, we draw the following observations:

(i) PSNR gains. The multicast algorithms satisfy the requests of up to 5 users without notable PSNR degradation. In contrast, the video quality with unicast decreases dramatically beyond 3 users, and adaptive unicast supports at most 5 users. With more than 5 users, all three multicast schemes experience some PSNR loss. Under heavy load, however, the optimal multicast considerably outperforms approximation and adaptive multicast, with PSNR improvements of about 3dB and 5dB, respectively.

(ii) Goodput gains. Due to zooming, the demands of different clients are not identical. Hence, the trend in average goodput does not strictly follow that of video quality (Table IX). As Figure 9 and Table IX show, the multicast algorithms outperform unicast in terms of both PSNR and goodput when there are more than 3 users. With more than 5 users, optimal multicast achieves the highest goodput; with 10 users, its improvements over approximation and adaptive multicast are 19% and 34%, respectively.
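These percentages can be checked against the 10-user row of Table IX (oMulticast 2.67 Mbps, approximation 2.25 Mbps, aMulticast 1.99 Mbps):

$$\frac{2.67}{2.25} \approx 1.187 \;(\approx +19\%), \qquad \frac{2.67}{1.99} \approx 1.342 \;(\approx +34\%).$$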

(iii) Fairness gains. The error bars in Figure 9 indicate that our optimal multicast achieves the best fairness among all algorithms, owing to the adaptive utility assignment (Section V-A) in our algorithm. Although adaptive multicast and adaptive unicast use a similar allocation method, they behave remarkably differently in terms of fairness: a multicast transmission can benefit multiple users, whereas a unicast transmission cannot, which may lead to less fairness among users.


TABLE IX
Average goodput (Mbps) achieved with heterogeneous link qualities at medium video rate

# users | aUnicast | aMulticast | approximation | oMulticast
1 | 3.83 | 3.79 | 3.81 | 3.82
3 | 2.95 | 3.45 | 3.46 | 3.41
5 | 1.8 | 3.07 | 3.05 | 3.07
8 | n/a | 2.1 | 2.27 | 2.56
10 | n/a | 1.99 | 2.25 | 2.67


B. Impact of Video Rate 

To evaluate the impact of video data rate (and thus the traffic load), we repeat the experiments using a different video with a lower rate and the same video encoded at a higher rate. We generate low-rate and high-rate videos in addition to the previously used medium rate; the configurations are detailed in Table VII. Wireless link quality settings are the same as in the previous section. Figure 10 and Figure 11 depict the average achieved PSNR for the low-rate and high-rate videos, respectively.

Figure 10 demonstrates that, as expected, all four algorithms perform better with a lighter workload. Specifically, the multicast algorithms scale up to 10 users without significant quality degradation, and the unicast scheme supports more clients.

Under a higher traffic load, all algorithms perform worse. Compared with the other schemes, however, our optimal algorithm still provides relatively fair quality. In general, if an added client neither introduces a lower link rate nor requests a higher resolution level, no additional multicast traffic is introduced. Thus, the video quality is only slightly reduced even as more clients join the multicast sessions.


C. Impact of RoI Similarity

Intuitively, a larger RoI overlap widens the performance gap between multicast and unicast. We evaluate the impact of RoI overlap in this section. In order to control the amount of overlap, we do not use the collected traces to simulate RoI variation. Instead, we manually vary the RoI sizes and positions so that they change in a uniform and controlled manner. Here, the RoI sizes and requested resolution levels of all clients are identical, and we vary the positions of the RoIs to generate different degrees of similarity.

To measure the degree of overlap, we first define the popularity $p_g$ of a tile $g$ as the fraction of users interested in it. The degree of overlap for user $i$ is then the total popularity of all tiles in $G(i)$, excluding the tiles requested only by user $i$, divided by the number of tiles in $G(i)$. We then define similarity as the average degree of overlap across all users. Figure 12 shows how PSNR changes with similarity for 8 users.
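The similarity metric defined above can be computed directly from the users' RoI tile sets. A minimal sketch, assuming every RoI is a non-empty set of tile ids; the function name is hypothetical.

```python
def similarity(rois):
    """Average RoI overlap degree across users, following the definition above.

    rois: list of non-empty sets; rois[i] is G(i), the tiles in user i's RoI.
    """
    n = len(rois)
    count = {}
    for roi in rois:
        for g in roi:
            count[g] = count.get(g, 0) + 1
    popularity = {g: c / n for g, c in count.items()}   # p_g
    degrees = []
    for roi in rois:
        shared = [g for g in roi if count[g] > 1]       # drop tiles only this user wants
        degrees.append(sum(popularity[g] for g in shared) / len(roi))
    return sum(degrees) / n
```

Identical RoIs give a similarity of 1 and pairwise-disjoint RoIs give 0, which are the two ends of the range explored in this experiment.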

The relatively stable video quality of the unicast scheme shows that it is not affected by the amount of RoI overlap. As expected, the improvement of multicast over unicast increases with RoI similarity. When the RoIs are identical (all users want the same regions), the improvement is about 12dB in PSNR. Interestingly, as similarity increases, the PSNR values of the three multicast algorithms converge to the same point. This convergence is caused both by the decrease in traffic demand and by the fact that the same data is requested.

Fig. 9. Average PSNR with medium video rate. Fig. 10. Average PSNR with low video rate. Fig. 11. Average PSNR with high video rate. (Each plot shows average PSNR versus number of users (1, 3, 5, 8, 10) for aUnicast, aMulticast, approximation, and oMulticast.)

Fig. 12. Average PSNR with different similarity.

D. Client Mobility 

The previous sections demonstrate the effectiveness of our optimal multicast algorithm with stationary clients. In this section, we evaluate the performance of our optimal algorithm under client mobility. In particular, we keep two clients static, with link rates of 6Mbps and 36Mbps. One additional mobile client starts from a location close to the AP, moves away from it, and then moves back. Figure 13 plots the average PSNR of the mobile client over every two-second interval. The movement period is from 40s to 120s.

In this experiment, the high-rate video is used and a 20s segment is played repeatedly. Although the RoI of each user and the allocations are fixed under static conditions, the PSNR differs from frame to frame, because different frames composed of mixed-resolution tiles have different sensitivities. The same trend of PSNR variation under static conditions can be observed across playbacks.

From the figure, we observe that our optimal algorithm consistently outperforms the two baseline algorithms. The average improvements of our optimal multicast over approximation and adaptive multicast are about 1dB and 4.5dB, respectively.



Fig. 13. Average PSNR of the mobile client. 


Moreover, our algorithm quickly adapts to the link rate, and after the movement ends (at 120s), the video quality quickly returns to a level similar to that of the static period.

VIII. Conclusion 

We have developed and implemented an efficient algorithm for multicasting mixed-resolution tiles to heterogeneous users in interactive video applications that support zoom and pan. Our algorithm optimizes the total utility of all clients and achieves significant improvements in video quality in our experimental settings: up to 3dB over the approximation multicast approach, 6dB over an adaptive multicast scheme, and 12dB over an adaptive unicast scheme. Additionally, our approach can be directly applied to design an optimal allocation algorithm for general multi-session video multicast. In the future, we will extend this work to scenarios with multiple access points (APs), where the AP association mechanism could be exploited to further enhance multicast performance.